ABSTRACT

The process of rank aggregation is intimately intertwined 
with the structure of skew-symmetric matrices. We apply 
recent advances in the theory and algorithms of matrix com- 
pletion to skew-symmetric matrices. This combination of 
ideas produces a new method for ranking a set of items. 
The essence of our idea is that a rank aggregation describes 
a partially filled skew-symmetric matrix. We extend an algo- 
rithm for matrix completion to handle skew-symmetric data 
and use that to extract ranks for each item. Our algorithm 
applies to both pairwise comparison and rating data. Be- 
cause it is based on matrix completion, it is robust to both 
noise and incomplete data. We show a formal recovery re- 
sult for the noiseless case and present a detailed study of the 
algorithm on synthetic data and Netflix ratings. 
1. INTRODUCTION

One of the classic data mining problems is to identify the 
important items in a data set; see Tan and Jin [2004] for 
an interesting example of how these might be used. For this 
task, we are concerned with rank aggregation. Given a series 
of votes on a set of items by a group of voters, rank aggre- 
gation is the process of permuting the set of items so that 
the first element is the best choice in the set, the second 
element is the next best choice, and so on. In fact, rank 
aggregation is an old problem and has a history stretch- 
ing back centuries [Condorcet, 1785]; one famous result is 
that any rank aggregation requires some degree of compro- 
mise [Arrow, 1950]. Our point in this introduction is not to 
detail a history of all the possible methods of rank aggregation,
but to give some perspective on our approach to the
problem. 

Direct approaches involve finding a permutation explic- 
itly - for example, computing the Kemeny optimal ranking 
[Kemeny, 1959] or the minimum feedback arc set problem. 
These problems are NP-hard [Dwork et al., 2001, Ailon et al., 
2005, Alon, 2006]. An alternate approach is to assign a score 
to each item, and then compute a permutation based on or- 
dering these items by their score, e.g. Saaty [1987]. In this 
manuscript, we focus on the second approach. A key advan- 
tage of the computations we propose is that they are convex 
problems and efficiently solvable. 

While the problem of rank aggregation is old, modern applications
- such as those found in web applications like Netflix
and Amazon - pose new challenges. First, the data collected
are usually cardinal measurements on the quality of
each item, such as 1-5 stars, received from voters. Second, 
the voters are neither experts in the rating domain nor ex- 
perts at producing useful ratings. These properties manifest 
themselves in a few ways, including skewed and indiscrim- 
inate voting behaviors [Ho and Quinn, 2008]. We focus on 
using aggregate pairwise data about items to develop a score 
for each item that predicts the pairwise data itself. This ap- 
proach eliminates some of the issues with directly utilizing 
voters' ratings, and we argue this point more precisely in
Section 2. 

To explain our method, consider a set of n items, labeled
from 1 to n. Suppose that each of these items has an unknown
intrinsic quality $s_i$, $1 \le i \le n$, where $s_i > s_j$ implies
that item i is better than item j. While the $s_i$'s are unknown,
suppose we are given a matrix Y where $Y_{ij} = s_i - s_j$. By
finding a rank-2 factorization of Y, for example

$$ Y = s e^T - e s^T, \tag{1} $$

we can extract unknown scores. The matrix Y is skew-symmetric
and describes any score-based global pairwise
ranking. (There are other possible rank-2 factorizations of
a skew-symmetric matrix, a point we return to later in
Section 3.1.)
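To make the structure in (1) concrete, the following minimal NumPy sketch builds Y from a hypothetical score vector and checks the two properties used throughout: skew-symmetry and rank 2. The function name and the example scores are illustrative assumptions, not part of the released codes.

```python
import numpy as np

def pairwise_from_scores(s):
    """Return Y with Y[i, j] = s[i] - s[j], i.e. Y = s e^T - e s^T."""
    s = np.asarray(s, dtype=float)
    e = np.ones_like(s)
    return np.outer(s, e) - np.outer(e, s)

s = np.array([3.0, 1.0, 2.0, 5.0])    # hypothetical item scores
Y = pairwise_from_scores(s)

is_skew = np.allclose(Y, -Y.T)         # skew-symmetry: Y = -Y^T
rank = np.linalg.matrix_rank(Y)        # 2 whenever s is not constant
```

Sorting the items by any score vector consistent with Y then gives the ranking.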

Thus, given a measured Y, the goal is to find a mini- 
mum rank approximation of Y that models the elements, 
and ideally one that is rank-2. Phrased in this way, it is a 
natural candidate for recent developments in the theory of 
matrix completion [Candes and Tao, to appear, Recht et al., 
to appear]. In the matrix completion problem, certain ele- 
ments of the matrix are presumed to be known. The goal is 
to produce a low-rank matrix that respects these elements - 
or at least minimizes the deviation from the known elements. 
Table 1: Notation for the paper.

Sym.                    Interpretation
$\mathcal{A}(\cdot)$    a linear map from a matrix to a vector
$e$                     a vector of all ones
$e_i$                   a vector with 1 in the ith entry, 0 elsewhere
$\|\cdot\|_*$           the nuclear norm
$R$                     a rating matrix (voters-by-items)
$Y$                     a fitted or model pairwise comparison matrix
$\hat{Y}$               a measured pairwise comparison matrix
$\Omega$                an index set for the known entries of a matrix



One catch, however, is that we require matrix completion 
over skew-symmetric matrices for pairwise ranking matrices. 
Thus, we must solve the matrix completion problem inside 
a structured class of matrices. This task is a novel contri- 
bution of our work. Recently, Gross [2010] also developed a 
technique for matrix completion with Hermitian matrices. 

With a "completed" matrix Y, the norm of the residual
$\|\hat{Y} - Y\|$ gives us a certificate for the validity of our fit - an
additional piece of information available in this model.

To continue, we briefly summarize our main contributions 
and our notational conventions. 

Our contributions. 

• We propose a new method for computing a rank aggre- 
gation based on matrix completion, which is tolerant 
to noise and incomplete data. 

• We solve a structured matrix-completion problem over 
the space of skew-symmetric matrices. 

• We prove a recovery theorem detailing when our ap- 
proach will work. 

• We perform a detailed evaluation of our approach with 
synthetic data and an anecdotal study with Netflix rat- 
ings. 

Notation. 

We try to follow standard notation conventions. Matrices
are bold, upright roman letters, vectors are bold, lowercase
roman letters, and scalars are unbolded roman or Greek letters.
The vector $e$ consists of all ones, and the vector $e_i$ has
a 1 in the ith position and 0's elsewhere. Linear maps on matrices
are written as script letters. An index set $\Omega$ is a group
of index pairs. Each $\omega_i \in \Omega$ is a pair (r, s) and we assume
that the $\omega$'s are numbered arbitrarily, i.e. $\Omega = \{\omega_1, \ldots, \omega_k\}$.
Please refer to Table 1 for reference.

Before proceeding further, let us outline the rest of the pa- 
per. First, Section 2 describes a few methods to take voter- 
item ratings and produce an aggregate pairwise comparison 
matrix. Additionally, we argue why pairwise aggregation is 
a superior technique when the goal is to produce an ordered 
list of the alternatives. Next, in Section 3, we describe for- 
mulations of the noisy matrix completion problem using the 
nuclear norm. In our setting, the LASSO formulation is the 
best choice, and we use it throughout the remainder. We 
briefly describe algorithms for matrix completion and focus 
on the SVP algorithm [Jain et al., 2010] in Section 3.1. We 
then show that the SVP algorithm preserves skew-symmetric 
structure. This process involves studying the singular value 



decomposition of skew-symmetric matrices. Thus, by the 
end of the section, we've shown how to formulate and solve 
for a scoring vector based on the nuclear norm. The follow- 
ing sections describe alternative approaches and show our 
recovery results. At the end, we show our experimental re- 
sults. In summary, our overall methodology is 

    Ratings (= R)
        ↓ (§2)
    Pairwise comparisons (= Y)
        ↓ (§3)
    Ranking scores (= s)
        ↓ (sorting)
    Rank aggregations.

An example of our rank aggregations is given in Table 2. We 
comment further on these in Section 6.3. 

Finally, we provide our computational and experimental 
codes so that others may reproduce our results: 
https://dgleich.com/projects/skew-nuclear

2. PAIRWISE AGGREGATION METHODS 

To begin, we describe methods to aggregate the votes of 
many voters, given by the matrix R, into a measured pair- 
wise comparison matrix Y . These methods have been well- 
studied in statistics [David, 1988]. In the next section, we 
show how to extract a score for each item from the matrix 
Y. 

Let R be a voter-by-item matrix. This matrix has m
rows corresponding to each of the m voters and n columns 
corresponding to the n items of the dataset. In all of the 
applications we explore, the matrix R is highly incomplete. 
That is, only a few items are rated by each voter. Usually 
all the items have a few votes, but there is no consistency in 
the number of ratings per item. 

Instead of using R directly, we compute a pairwise aggre- 
gation. Pairwise comparisons have a lengthy history, dating 
back to the first half of the previous century [Kendall and Smith, 
1940]. They also have many nice properties. First, Miller 
[1956] observes that most people can evaluate only 5 to 9 
alternatives at a time. This fact may relate to the com- 
mon choice of a 5-star rating (e.g. the ones used by Amazon, 
eBay, Netflix, YouTube). Thus, comparing pairs of movies 
is easier than ranking a set of 20 movies. Furthermore, only 
pairwise comparisons are possible in certain settings such as 
tennis tournaments. Pairwise comparison methods are thus 
natural for analyzing ranking data. Second, pairwise com- 
parisons are a relative measure and help reduce bias from the 
rating scale. For these reasons, pairwise comparison meth- 
ods have been popular in psychology, statistics, and social 
choice theory [David, 1988, Arrow, 1950]. Such methods 
have also been adopted by the learning to rank community; 
see the contents of Li et al. [2008]. A final advantage of pairwise
methods is that they are much more complete than the
ratings matrix. For Netflix, R is 99% incomplete, whereas Y 
is only 0.22% incomplete and most entries are supported by 
many comparisons. See Figure 1 for information about the 
number of pairwise comparisons in Netflix and MovieLens. 

More critically, an incomplete array of user-by-product 
ratings is a strange matrix - not every 2-dimensional array 
of numbers is best viewed as a matrix - and using the rank of 
this matrix (or its convex relaxation) as a key feature in the 
modeling needs to be done with care. Consider: if, instead of
rating values 1 to 5, 0 to 4 are used to represent the exact






Table 2: The top 15 movies from Netflix generated by our ranking method (middle and right). The left list is 
the ranking using the mean rating of each movie and is emblematic of the problems global ranking methods 
face when infrequently compared items rocket to the top. We prefer the middle and right lists. See Section 6 
and Figure 4 for information about the conditions and additional discussion. LOTR III appears twice because 
of the two DVD editions, theatrical and extended.



Mean                       Log-odds (all)             Arithmetic Mean (30)
LOTR III: Return ...       LOTR III: Return ...       LOTR III: Return ...
LOTR I: The Fellowship ... LOTR I: The Fellowship ... LOTR I: The Fellowship ...
LOTR II: The Two ...       LOTR II: The Two ...       LOTR II: The Two ...
Lost: Season 1             Star Wars V: Empire ...    Lost: S1
Battlestar Galactica: S1   Raiders of the Lost Ark    Star Wars V: Empire ...
Fullmetal Alchemist        Star Wars IV: A New Hope   Battlestar Galactica: S1
Trailer Park Boys: S4      Shawshank Redemption       Star Wars IV: A New Hope
Trailer Park Boys: S3      Star Wars VI: Return ...   LOTR III: Return ...
Tenchi Muyo! ...           LOTR III: Return ...       Raiders of the Lost Ark
Shawshank Redemption       The Godfather              The Godfather
Veronica Mars: S1          Toy Story                  Shawshank Redemption
Ghost in the Shell: S2     Lost: S1                   Star Wars VI: Return ...
Arrested Development: S2   Schindler's List           Gladiator
Simpsons: S6               Finding Nemo               Simpsons: S5
Inu-Yasha                  CSI: S4                    Schindler's List


same information, the rank of this new rating matrix will 
change. Furthermore, whether we use a rating scale where 
1 is the best rating and 5 is worst, or one where 5 is the 
best and 1 is the worst, a low-rank model would give the 
exact same fit with the same input values, even though the 
connotations of the numbers are reversed.

On the other hand, the pairwise ranking matrix that we 
construct below is invariant under monotone transformation 
of the rating values and depends only on the degree of rel- 
ative preference of one alternative over another. It circum- 
vents the previously mentioned pitfalls and is a more princi- 
pled way to employ a rank/nuclear norm model. 
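A small numerical illustration of both points, using a hypothetical complete 2-voter, 2-item rating matrix: shifting the scale from 1-based to 0-based changes the rank of R, while a binary pairwise comparison is unchanged by any strictly increasing transformation of the ratings. The helper name is an assumption made for this sketch.

```python
import numpy as np

R = np.array([[1.0, 2.0],
              [2.0, 4.0]])                       # hypothetical ratings
rank_original = np.linalg.matrix_rank(R)         # rows are proportional
rank_shifted = np.linalg.matrix_rank(R - 1.0)    # same information, 0-based scale

def binary_comparison(R):
    """Y[i, j] = mean over voters of sign(R[a, i] - R[a, j])."""
    diff = R[:, :, None] - R[:, None, :]
    return np.sign(diff).mean(axis=0)

# A strictly increasing map of the ratings leaves every sign unchanged.
same = np.array_equal(binary_comparison(R),
                      binary_comparison(np.exp(R - 1.0)))
```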

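We now describe five techniques to build an aggregate
pairwise matrix Y from the rating matrix R. Let $\alpha$ denote
the index of a voter, and i and j the indices of two items.
The entries of R are $R_{\alpha i}$. To each voter, we associate a
pairwise comparison matrix $Y^{\alpha}$. The aggregation is usually
computed by something like a mean over $Y^{\alpha}$. Throughout, we
orient the comparisons so that a positive entry $Y_{ij}$ means that
item i is preferred to item j, matching $Y_{ij} = s_i - s_j$.

1. Arithmetic mean of score differences The score
difference is $Y^{\alpha}_{ij} = R_{\alpha i} - R_{\alpha j}$. The arithmetic mean of
all voters who have rated both i and j is

$$ Y_{ij} = \frac{\sum_{\alpha} (R_{\alpha i} - R_{\alpha j})}{\#\{\alpha \mid R_{\alpha i}, R_{\alpha j} \text{ exist}\}}. $$

These comparisons are translation invariant.

2. Geometric mean of score ratios Assuming R > 0,
the score ratio refers to $Y^{\alpha}_{ij} = R_{\alpha i} / R_{\alpha j}$. The (log)
geometric mean over all voters who have rated both i
and j is

$$ Y_{ij} = \frac{\sum_{\alpha} (\log R_{\alpha i} - \log R_{\alpha j})}{\#\{\alpha \mid R_{\alpha i}, R_{\alpha j} \text{ exist}\}}. $$

These are scale invariant.

3. Binary comparison Here $Y^{\alpha}_{ij} = \operatorname{sign}(R_{\alpha i} - R_{\alpha j})$.
Its average is the difference between the probability that
alternative i is preferred to j and the probability of the reverse:

$$ Y_{ij} = \Pr\{\alpha \mid R_{\alpha i} > R_{\alpha j}\} - \Pr\{\alpha \mid R_{\alpha i} < R_{\alpha j}\}. $$

These are invariant to a monotone transformation.

4. Strict binary comparison This method is almost
the same as the last method, except that we eliminate
cases where users rated movies equally. That is,

$$ Y^{\alpha}_{ij} = \begin{cases} 1 & R_{\alpha i} > R_{\alpha j} \\ \text{(ignored)} & R_{\alpha i} = R_{\alpha j} \\ -1 & R_{\alpha i} < R_{\alpha j}. \end{cases} $$

Again, the average $Y_{ij}$ has a similar interpretation to
binary comparison, but only among people who expressed
a strict preference for one item over the other.
Equal ratings are ignored.

5. Logarithmic odds ratio This idea translates binary
comparison to a logarithmic scale:

$$ Y_{ij} = \log \frac{\Pr\{\alpha \mid R_{\alpha i} > R_{\alpha j}\}}{\Pr\{\alpha \mid R_{\alpha i} < R_{\alpha j}\}}. $$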

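The first and third aggregations above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name, the use of `np.nan` for a missing rating, and the tiny rating matrix are assumptions, and the sign convention is that a positive entry means the row item is preferred to the column item.

```python
import numpy as np

def aggregate(R, method="arithmetic"):
    """Aggregate a voters-by-items array R (np.nan = not rated) into Y.

    Y[i, j] > 0 means item i is preferred to item j on average. Entries with
    no voter who rated both items stay np.nan. counts[i, j] is the number of
    voters supporting the entry.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[1]
    Y = np.full((n, n), np.nan)
    counts = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
            counts[i, j] = both.sum()
            if counts[i, j] == 0:
                continue
            d = R[both, i] - R[both, j]
            Y[i, j] = d.mean() if method == "arithmetic" else np.sign(d).mean()
    return Y, counts

nan = np.nan
R = np.array([[5, 3, nan],
              [4, nan, 1],
              [nan, 2, 2],
              [5, 4, nan]])          # hypothetical 4 voters, 3 items
Y_mean, counts = aggregate(R, "arithmetic")
Y_sign, _ = aggregate(R, "binary")
```

The `counts` array is what a minimum-comparisons threshold would be applied to before completion.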
3. RANK AGGREGATION WITH THE 
NUCLEAR NORM 

Thus far, we have seen how to compute an aggregate pair- 
wise matrix Y from ratings data. While Y has fewer missing 
entries than R - roughly 1-80% missing instead of almost 
99% missing - it is still not nearly complete. In this section, 
we discuss how to use the theory of matrix completion to es- 
timate the scoring vector underlying the comparison matrix 
Y. These same techniques apply even when Y is not com- 
puted from ratings and is measured through direct pairwise 
comparisons. 

Let us now state the matrix completion problem formally
[Candes and Recht, 2009, Recht et al., to appear]. Given a
matrix A where only a subset of the entries are known, the
goal is to find the lowest rank matrix X that agrees with A
in all the known entries. Let $\Omega$ be the index set corresponding to
the known entries of A. Now define $\mathcal{A}(X)$ as a linear map
corresponding to the elements of $\Omega$, i.e. $\mathcal{A}(X)$ is a vector
where the ith element is defined to be

$$ [\mathcal{A}(X)]_i = X_{\omega_i}, \tag{2} $$
[Figure 1 panels; horizontal axis: Number of Pairwise Comparisons]

(a) MovieLens - 85.49% of total pairwise comparisons (b) Netflix - 99.77% of total pairwise comparisons

Figure 1: A histogram of the number of pairwise comparisons between movies in MovieLens (left) and 
Netflix (right). The number of pairwise comparisons is the number of users with ratings on both movies. 
These histograms show that most items have more than a small number of comparisons between them. For 
example, 18.5% and 34.67% of all possible pairwise entries have more than 30 comparisons between them. 
Largely speaking, this figure justifies dropping infrequent ratings from the comparison. This step allows us 
to take advantage of the ability of the matrix-completion methods to deal with incomplete data. 



and where we interpret $X_{\omega_i}$ as the entry of the matrix X
for the index pair $(r, s) = \omega_i$. Finally, let $b = \mathcal{A}(Y)$ be the
values of the specified entries of the matrix Y. This idea of
matrix completion corresponds with the solution of



$$ \begin{array}{ll} \text{minimize} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = b. \end{array} \tag{3} $$



Unfortunately, like the direct methods at permutation mini- 
mization, this approach is NP-hard [Vandenberghe and Boyd, 
1996]. 

To make the problem tractable, an increasingly well-known 
technique is to replace the rank function with the nuclear 
norm [Fazel, 2002]. For a matrix A, the nuclear norm is 
defined as

$$ \|A\|_* = \sum_{i=1}^{\operatorname{rank}(A)} \sigma_i(A), \tag{4} $$

where $\sigma_i(A)$ is the ith singular value of A. The nuclear norm
has a few other names: the Ky-Fan n-norm, the Schatten
1-norm, and the trace norm (when applied to symmetric
matrices), but we will just use the term nuclear norm here.
It is a convex underestimator of the rank function on the
unit spectral-norm ball $\{A : \sigma_{\max}(A) \le 1\}$, i.e.
$\|A\|_* \le \operatorname{rank}(A)\,\sigma_{\max}(A)$, and is the largest convex function with this
property. Because the nuclear norm is convex,

$$ \begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \mathcal{A}(X) = b \end{array} \tag{5} $$

is a convex relaxation of (3) analogous to how the 1-norm is
a convex relaxation of the 0-norm.
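As a quick check of definition (4) and of the underestimator bound, the sketch below computes the nuclear norm of a pairwise difference matrix from its singular values. The score vector is a hypothetical example. `np.linalg.norm(..., "nuc")` is NumPy's built-in nuclear norm and is used only for comparison.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

s = np.array([3.0, 1.0, 2.0, 5.0])       # hypothetical scores
e = np.ones(4)
Y = np.outer(s, e) - np.outer(e, s)

sigma = np.linalg.svd(Y, compute_uv=False)
rank = np.linalg.matrix_rank(Y)

# ||Y||_* <= rank(Y) * sigma_max(Y), up to a small rounding tolerance
bound_holds = nuclear_norm(Y) <= rank * sigma[0] + 1e-9
matches_builtin = np.isclose(nuclear_norm(Y), np.linalg.norm(Y, "nuc"))
```

For this matrix the two nonzero singular values coincide, so the bound is attained.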

In (5) we have $\mathcal{A}(X) = b$, which is called a noiseless completion
problem. Noisy completion problems only require
$\mathcal{A}(X) \approx b$. We present four possibilities inspired by similar
approaches in compressed sensing. For the compressed
sensing problem with noise:

$$ \text{minimize } \|x\|_1 \quad \text{subject to } Ax \approx b $$

there are four well known formulations: LASSO [Tibshirani,
1996], QP [Chen et al., 1998], DS [Candes and Tao, 2007] and
BPDN [Fuchs, 2004]. For the noisy matrix completion problem,
the same variations apply, but with the nuclear norm
taking the place of the 1-norm:

LASSO
$$ \begin{array}{ll} \text{minimize} & \|\mathcal{A}(X) - b\|_2 \\ \text{subject to} & \|X\|_* \le \tau \end{array} $$

DS
$$ \begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \sigma_{\max}(\mathcal{A}^*(\mathcal{A}(X) - b)) \le \mu \end{array} $$

QP Mazumder et al. [2009]
$$ \text{minimize } \|\mathcal{A}(X) - b\|_2^2 + \lambda \|X\|_* $$

BPDN Mazumder et al. [2009]
$$ \begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \|\mathcal{A}(X) - b\|_2 \le \sigma \end{array} $$

Returning to rank-aggregation, recall the perfect case for
the matrix Y: there is an unknown quality $s_i$ associated
with each item i and $Y = se^T - es^T$. We now assume that
the pairwise comparison matrix $\hat{Y}$ computed in the previous
section approximates the true Y. Given such a $\hat{Y}$, our goal
is to complete it with a rank-2 matrix. Thus, our objective:

$$ \begin{array}{ll} \text{minimize} & \|\mathcal{A}(X) - b\|_2 \\ \text{subject to} & \|X\|_* \le 2 \\ & X = -X^T \end{array} \tag{6} $$

where $\mathcal{A}(\cdot)$ corresponds to the filled entries of $\hat{Y}$. We adopt
the LASSO formulation because we want rank(X) = 2, and
$\|\cdot\|_*$ underestimates rank as previously mentioned. This
problem only differs from the standard matrix completion
problem in one regard: the skew-symmetric constraint. With
a careful choice of solver, this additional constraint comes
"for free" (with a few technical caveats). It should also be
possible to use the skew-Lanczos process to exploit the
skew-symmetry in the SVD computation.

3.1 Algorithms 

Algorithms for matrix completion seem to sprout like wildflowers
in spring: Lee and Bresler [2009], Cai et al. [2008],
Toh and Yun [2009], Dai and Milenkovic [2009], Keshavan and 
[2009], Mazumder et al. [2009], Jain et al. [2010]. Each al- 
gorithm fills a slightly different niche, or improves a perfor- 
mance measure compared to its predecessors. 

We first explored crafting our own solver by adapting pro- 
jection and thresholding ideas used in these algorithms to 
the skew-symmetrically constrained variant. However, we re- 
alized that many algorithms do not require any modification 
to solve the problem with the skew-symmetric constraint. 
This result follows from properties of skew-symmetric matri- 
ces we show below. 

Thus, we use the SVP algorithm by Jain et al. [2010] . For 
the matrix completion problem, they found their implemen- 
tation outperformed many competitors. It is scalable and 
handles a LASSO-like objective for a fixed rank approxima- 
tion. For completeness, we restate the SVP procedure in 
Algorithm 1. 

Algorithm 1 Singular Value Projection [Jain et al., 2010]:
Solve a matrix completion problem. We use the notation
$\Omega(X)$ to denote the output of $\mathcal{A}(X)$ when $\mathcal{A}(\cdot)$ is an index set.

INPUT index set $\Omega$, target values b, target rank k,
maximum rank k, step length $\eta$, tolerance $\varepsilon$
1: Initialize $X^{(0)} = 0$, $t = 0$
2: REPEAT
3:   Set $U^{(t)} \Sigma^{(t)} V^{(t)T}$ to be the rank-k SVD of a matrix
     with non-zeros $\Omega$ and values
         $-\eta(\Omega(X^{(t)}) - b)$
4:   $X^{(t+1)} \leftarrow U^{(t)} \Sigma^{(t)} V^{(t)T}$
5:   $t \leftarrow t + 1$
6: UNTIL $\|\Omega(X^{(t)}) - b\|_2 \le \varepsilon$
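For readers who want to experiment, here is a minimal dense NumPy sketch of a singular value projection iteration. It is not the authors' implementation: it assumes the standard SVP update of Jain et al. [2010], in which each iterate is the best rank-k approximation of $X^{(t)} - \eta\,\mathcal{A}^*(\mathcal{A}(X^{(t)}) - b)$, and the function name, defaults, and test data are hypothetical. A practical solver would use sparse, truncated SVDs.

```python
import numpy as np

def svp(n, rows, cols, b, k=2, eta=1.0, tol=1e-6, max_iter=500):
    """Singular value projection sketch for an n-by-n completion problem.

    rows, cols, b list the known entries (no duplicates); eta is the step
    length. Each iterate is the best rank-k approximation of X - eta * P(X),
    where P(X) holds the residual X[rows, cols] - b on the known entries and
    zeros elsewhere (assumed standard SVP update).
    """
    X = np.zeros((n, n))
    for _ in range(max_iter):
        residual = X[rows, cols] - b
        if np.linalg.norm(residual) <= tol:
            break
        G = X.copy()
        G[rows, cols] -= eta * residual           # gradient step on known entries
        U, S, Vt = np.linalg.svd(G)
        X = (U[:, :k] * S[:k]) @ Vt[:k, :]        # project back to rank k
    return X

s = np.array([3.0, 1.0, 2.0, 5.0, 4.0])           # hypothetical scores
n = len(s)
Y = np.subtract.outer(s, s)                       # Y[i, j] = s[i] - s[j]
rows, cols = np.indices((n, n)).reshape(2, -1)

# Fully observed: one projection step reproduces the rank-2 matrix.
X_full = svp(n, rows, cols, Y[rows, cols])

# Partially observed, with a symmetric sampling pattern.
keep = (rows + cols) % 3 != 0
b_part = Y[rows[keep], cols[keep]]
X_part = svp(n, rows[keep], cols[keep], b_part, max_iter=50)
res_part = np.linalg.norm(X_part[rows[keep], cols[keep]] - b_part)
```

With step length at most 1, each projected step cannot increase the residual on the known entries, which is why the sketch defaults to `eta=1.0`. How quickly a partially observed problem converges depends on the sampling pattern.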



If the constraint data $\mathcal{A}(\cdot)$ and $b$ come from a skew-symmetric
matrix, then this algorithm produces a skew-symmetric matrix
as well. Showing this involves a few properties of
skew-symmetric matrices and two lemmas.

We begin by stating a few well-known properties of
skew-symmetric matrices. Let $A = -A^T$ be skew-symmetric.
Then all the eigenvalues of A are pure-imaginary and come
in complex-conjugate pairs. Thus, a skew-symmetric matrix
must always have even rank. Let B be a square real-valued
matrix, then the closest skew-symmetric matrix to B (in any
norm) is $A = (B - B^T)/2$. These results have elementary
proofs. We continue by characterizing the singular value
decomposition of a skew-symmetric matrix.

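Lemma 1. Let $A = -A^T$ be an n x n skew-symmetric matrix
with eigenvalues $\imath\lambda_1, -\imath\lambda_1, \imath\lambda_2, -\imath\lambda_2, \ldots, \imath\lambda_j, -\imath\lambda_j$, where
$\lambda_i \ge 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of A is given by

$$ A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j)\, V^T \tag{7} $$

for U and V given in the proof.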

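Proof. Using the Murnaghan-Wintner form of a real matrix
[Murnaghan and Wintner, 1931], we can write

$$ A = X T X^T $$

for a real-valued orthogonal matrix X and real-valued
block-upper-triangular matrix T, with 2-by-2 blocks along the
diagonal. Due to this form, T must also be skew-symmetric.
Thus, it is a block-diagonal matrix that we can permute to
the form:

$$ T = \begin{bmatrix} 0 & \lambda_1 & & & \\ -\lambda_1 & 0 & & & \\ & & 0 & \lambda_2 & \\ & & -\lambda_2 & 0 & \\ & & & & \ddots \end{bmatrix}. $$

Note that the SVD of the matrix

$$ \begin{bmatrix} 0 & \lambda_1 \\ -\lambda_1 & 0 \end{bmatrix} \text{ is } \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_1 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. $$

We can use this expression to complete the proof:

$$ A = \underbrace{X \begin{bmatrix} 0 & 1 & & \\ 1 & 0 & & \\ & & \ddots & \\ & & & \begin{smallmatrix} 0 & 1 \\ 1 & 0 \end{smallmatrix} \end{bmatrix}}_{U} \begin{bmatrix} \lambda_1 & & & & \\ & \lambda_1 & & & \\ & & \ddots & & \\ & & & \lambda_j & \\ & & & & \lambda_j \end{bmatrix} \underbrace{\begin{bmatrix} -1 & 0 & & \\ 0 & 1 & & \\ & & \ddots & \\ & & & \begin{smallmatrix} -1 & 0 \\ 0 & 1 \end{smallmatrix} \end{bmatrix} X^T}_{V^T}. $$

Both the matrices U and V are real and orthogonal. Thus,
this form yields the SVD of A. □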

We now use this lemma to show that - under fairly general
conditions - the best rank-k approximation to a
skew-symmetric matrix is also skew-symmetric.

Lemma 2. Let A be an n-by-n skew-symmetric matrix,
and let k = 2j be even. Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_j > \lambda_{j+1}$
be the magnitude of the singular value pairs. (Recall that
the previous lemma showed that the singular values come
in pairs.) Then the best rank-k approximation of A in an
orthogonally invariant norm is also skew-symmetric.

Proof. This lemma follows fairly directly from Lemma 1.
Recall that the best rank-k approximation of A in an
orthogonally invariant norm is given by the k largest singular values
and vectors. By assumption of the lemma, there is a gap
in the spectrum between the kth and (k+1)-st singular value.
Thus, taking the SVD form from Lemma 1 and truncating
to the k largest singular values produces a skew-symmetric
matrix. □
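The properties above and the conclusion of Lemma 2 are easy to check numerically. The sketch below uses a random 6-by-6 matrix (an arbitrary, hypothetical test case with a fixed seed) and assumes, as the lemma does, that there is a gap after the kth singular value, which holds generically for random data.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = (B - B.T) / 2                        # closest skew-symmetric matrix to B

eigenvalues = np.linalg.eigvals(A)
U, S, Vt = np.linalg.svd(A)              # S is sorted in decreasing order

k = 2                                    # even target rank
A_k = (U[:, :k] * S[:k]) @ Vt[:k, :]     # best rank-k approximation

purely_imaginary = np.allclose(eigenvalues.real, 0.0)
paired = np.allclose(S[0::2], S[1::2])   # singular values come in equal pairs
still_skew = np.allclose(A_k, -A_k.T)
```

Truncating to an odd rank would split a singular value pair, which is why the lemma requires k to be even.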

Finally, we can use this second result to show that the SVP
algorithm for the LASSO problem preserves skew-symmetry
in all the iterates $X^{(t)}$.

Theorem 3. Given a set of skew-symmetric constraints
$\mathcal{A}(\cdot) = b$, the solution of the LASSO problem from the SVP
solver is a skew-symmetric matrix X if the target rank is
even and the dominant singular values stay separated as in
the previous lemma.
Algorithm 2 Nuclear Norm Rank Aggregation. The SVP
subroutine is given by Algorithm 1.

INPUT ranking matrix R, minimum comparisons c
1: Compute Y from R by a procedure in Section 2.
2: Discard entries in Y with fewer than c comparisons
3: Let $\Omega$ be the index set for all retained entries in Y and
   b be the values for these entries
4: U, S, V = SVP(index set $\Omega$, values b, rank 2)
5: Compute $s = (1/n)\, U S V^T e$



Proof. In this proof, we revert to the notation $\mathcal{A}(X)$
and use $\mathcal{A}^*(z)$ to denote the matrix with non-zeros in $\Omega$ and
values from z. We proceed by induction on the iterates generated
by the SVP algorithm. Clearly $X^{(0)}$ is skew-symmetric.
In step 3, we compute the SVD of a skew-symmetric matrix:
$\mathcal{A}^*(\mathcal{A}(X^{(t)}) - b)$. The result, which is the next iterate,
is skew-symmetric based on the previous lemma and conditions
of this theorem. □

The SVP solver thus solves (6) for a fixed rank problem.
A final step is to extract the scoring vector s from a rank-2
singular value decomposition. If we had the exact matrix
Y, then $(1/n)Ye = s - ((s^T e)/n)\,e$, which yields the score
vector centered around 0. Using a simple result noted by
Langville and Meyer [forthcoming], $s = (1/n)\hat{Y}e$ is
also the best least-squares approximation to s in the case
that $\hat{Y}$ is not an exact pairwise difference matrix. Formally,
$(1/n)\hat{Y}e = \operatorname{argmin}_s \|\hat{Y} - (se^T - es^T)\|$. The outcome
that a rank-2 $U\Sigma V^T$ from SVP is not of the form
$se^T - es^T$ is quite possible because there are many rank-2
skew-symmetric matrices that do not have e as a factor.
However, the above discussion justifies using $(1/n)\,U\Sigma V^T e$
derived from this completed matrix.
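The score-extraction step can be checked directly. The sketch below, with hypothetical scores and a random skew-symmetric perturbation, verifies that the row means of an exact Y are the centered scores. It also spot-checks the least-squares property in the Frobenius norm by comparing against randomly perturbed candidates.

```python
import numpy as np

def scores_from_matrix(Y):
    """Return (1/n) Y e, i.e. the row means of Y."""
    return Y.mean(axis=1)

s = np.array([3.0, 1.0, 2.0, 5.0])       # hypothetical scores
e = np.ones(4)
Y = np.outer(s, e) - np.outer(e, s)
exact = np.allclose(scores_from_matrix(Y), s - s.mean())

# Perturb Y by a skew-symmetric matrix so it is no longer an exact
# pairwise difference matrix.
rng = np.random.default_rng(1)
N = rng.standard_normal((4, 4))
Y_noisy = Y + (N - N.T) / 2
s_hat = scores_from_matrix(Y_noisy)

def misfit(v):
    """Frobenius-norm misfit of the model v e^T - e v^T."""
    return np.linalg.norm(Y_noisy - (np.outer(v, e) - np.outer(e, v)))

no_worse = all(
    misfit(s_hat) <= misfit(s_hat + 0.1 * rng.standard_normal(4)) + 1e-12
    for _ in range(20)
)
```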

Our complete ranking procedure is given by Algorithm 2. 

4. OTHER APPROACHES 

Now, we briefly compare our approach with other techniques
to compute ranking vectors from pairwise comparison
data. An obvious approach is to find the least-squares
solution $\min_s \sum_{(i,j) \in \Omega} (Y_{ij} - (s_i - s_j))^2$. This is a linear
least squares method, and is exactly what Massey [1997] pro- 
posed for ranking sports teams. The related Colley method 
introduces a bit of regularization into the least-squares prob- 
lem [Colley, 2002]. By way of comparison, the matrix com- 
pletion approach has the same ideal objective, however, we 
compute solutions using a two-stage process: first complete 
the matrix, and then extract scores. 
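For comparison, the direct least-squares formulation over the known entries can be sketched as follows. This is a plain unregularized least-squares solve, not the Massey or Colley implementations. The function name and the example pairs are hypothetical.

```python
import numpy as np

def least_squares_scores(n, pairs, values):
    """Solve min_s sum over known (i, j) of (y_ij - (s_i - s_j))^2.

    The scores are only determined up to an additive constant, so the
    result is centered to have zero mean.
    """
    D = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        D[row, i], D[row, j] = 1.0, -1.0
    s, *_ = np.linalg.lstsq(D, np.asarray(values, dtype=float), rcond=None)
    return s - s.mean()

s_true = np.array([3.0, 1.0, 2.0, 5.0])          # hypothetical scores
pairs = [(0, 1), (1, 2), (2, 3), (0, 3)]         # known, connected comparisons
values = [s_true[i] - s_true[j] for i, j in pairs]
s_ls = least_squares_scores(4, pairs, values)
```

On noiseless, connected data this recovers the centered scores exactly. The two-stage completion approach differs in that it first fits a low-rank skew-symmetric matrix to the known entries.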

A related methodology with skew-symmetric matrices un- 
derlies recent developments in the application of Hodge the- 
ory to rank aggregation [Jiang et al., 2010]. By analogy with 
the Hodge decomposition of a vector space, they propose a 
decomposition of pairwise rankings into consistent, globally 
inconsistent, and locally inconsistent pieces. Our approach 
differs because our algorithm applies without restriction on 
the comparisons. Freeman [1997] also uses an SVD of a 
skew-symmetric matrix to discover a hierarchical structure 
in a social network. 

We know of two algorithms to directly estimate the item
value from ratings [de Kerchov and van Dooren, 2007, Ho and
Quinn, 2008]. Both of these methods include a technique to model
voter behavior. They find that skewed behaviors and in- 
consistencies in the ratings require these adjustments. In 



contrast, we eliminate these problems by using the pairwise 
comparison matrix. Approaches using a matrix or tensor 
factorization of the rating matrix directly often have to de- 
termine a rank empirically [Rendle et al., 2009]. 

The problem with the mean rating from Netflix in Table 2 
is often corrected by requiring a minimum number of ratings
on an item. For example, IMDB builds its top-250 movie 
list based on a Bayesian estimate of the mean with at least 
3000 ratings (imdb.com/chart/top). Choosing this parame- 
ter is problematic as it directly excludes items. In contrast, 
choosing the minimum number of comparisons to support 
an entry in Y may be easier to justify. 

5. RECOVERABILITY 

A hallmark of the recent developments on matrix comple- 
tion is the existence of theoretical recoverability guarantees 
(see Candes and Recht [2009], for example). These guaran- 
tees give conditions under which the solution to the optimiza- 
tion problems posed in Section 3 is or is nearby the low-rank 
matrix from whence the samples originated. In this section, 
we apply a recent theoretical insight into matrix completion 
based on operator bases to our problem of recovering a scor- 
ing vector from a skew-symmetric matrix [Gross, 2010]. We 
only treat the noiseless problem to present a simplified analysis.
Also, the notation in this section differs slightly from the
rest of the manuscript, in order to match the statements in
Gross [2010] better. In particular, $\Omega$ is not necessarily the
index set, $\imath$ represents $\sqrt{-1}$, and most of the results are for
the complex field.

The goal of this section is to apply Theorem 3 from Gross
[2010] to skew-symmetric matrices arising from score differ- 
ence vectors. We restate that theorem for reference. 

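Theorem 4 (Theorem 3, Gross [2010]). Let A be a
rank-r Hermitian matrix with coherence $\nu$ with respect to an
operator basis $\{W_i\}_{i=1}^{n^2}$. Let $\Omega \subset [1, n^2]$ be a random set of
size $|\Omega| > O(n r \nu (1 + \beta)(\log n)^2)$. Then the solution of

$$ \begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \operatorname{trace}(X^* W_i) = \operatorname{trace}(A^* W_i), \quad i \in \Omega \end{array} $$

is unique and is equal to A with probability at least $1 - n^{-\beta}$.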

The definition of coherence follows shortly. On the sur- 
face, this theorem is useless for our application. The matrix 
we wish to complete is not Hermitian, it's skew-symmetric. 
However, given a real-valued skew-symmetric matrix Y, the 
matrix $\imath Y$ is Hermitian; and hence, we will work to apply
this theorem to this particular Hermitian matrix. Again, 
we adopt this approach for simplicity. It is likely that a 
statement of Theorem 4 with Hermitian replaced with skew- 
Hermitian also holds, although verifying this would require 
a reproduction of the proof from Gross [2010]. 
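The observation that $\imath Y$ is Hermitian is simple to confirm numerically. The sketch below uses a hypothetical centered score vector and checks that $\imath Y$ equals its conjugate transpose and has real eigenvalues $\pm\|s\|\sqrt{n}$ together with zeros.

```python
import numpy as np

s = np.array([1.0, -1.0, 1.0, -1.0])     # hypothetical centered scores, s^T e = 0
e = np.ones_like(s)
Y = np.outer(s, e) - np.outer(e, s)

H = 1j * Y                               # the Hermitian matrix i*Y
is_hermitian = np.allclose(H, H.conj().T)
eigs = np.linalg.eigvalsh(H)             # real eigenvalues, ascending order
expected = np.linalg.norm(s) * np.sqrt(len(s))
```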

The following theorem gives us a condition for recovering 
the score vector using matrix completion. As stated, this 
theorem is not particularly useful because s may be recov- 
ered from noiseless measurements by exploiting the special 
structure of the rank-2 matrix Y. For example, if we know 
$Y_{ij} = s_i - s_j$ then given $s_i$ we can find $s_j$. This argument
may be repeated with an arbitrary starting point as long as
the known index set corresponds to a connected set over the 
indices. Instead we view the following theorem as providing 
intuition for the noisy problem. 
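The propagation argument for the noiseless case can be written as a breadth-first traversal over the graph of known pairs. The sketch below is illustrative: the function name and the dictionary layout of the known entries are assumptions, and the scores are recovered only up to an additive shift fixed by the starting item.

```python
from collections import deque

def propagate_scores(n, known, start=0):
    """Recover s (up to an additive shift) from exact differences.

    known maps (i, j) to Y_ij = s_i - s_j. Returns None if the known pairs
    do not connect all n items.
    """
    neighbors = {i: [] for i in range(n)}
    for (i, j), y in known.items():
        neighbors[i].append((j, -y))     # s_j = s_i - y
        neighbors[j].append((i, y))      # s_i = s_j + y
    s = {start: 0.0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, delta in neighbors[i]:
            if j not in s:
                s[j] = s[i] + delta
                queue.append(j)
    if len(s) < n:
        return None
    return [s[i] for i in range(n)]

# Hypothetical scores [3, 1, 2, 5] observed through three exact differences.
known = {(0, 1): 2.0, (1, 2): -1.0, (3, 2): 3.0}
recovered = propagate_scores(4, known)
disconnected = propagate_scores(4, {(0, 1): 2.0, (2, 3): -3.0})
```

With noisy measurements, different paths between two items give different answers, which is where a completion or least-squares fit becomes necessary.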
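Consider the operator basis for Hermitian matrices:
$\mathcal{H} = \mathcal{S} \cup \mathcal{K} \cup \mathcal{D}$ where

$$ \mathcal{S} = \{ \tfrac{1}{\sqrt{2}} (e_i e_j^T + e_j e_i^T) : 1 \le i < j \le n \}; $$
$$ \mathcal{K} = \{ \tfrac{\imath}{\sqrt{2}} (e_i e_j^T - e_j e_i^T) : 1 \le i < j \le n \}; $$
$$ \mathcal{D} = \{ e_i e_i^T : 1 \le i \le n \}. $$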

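Theorem 5. Let s be centered, i.e., $s^T e = 0$. Let $Y = se^T - es^T$
where $\theta = \max_i s_i^2/(s^T s)$ and $\rho = ((\max_i s_i) - (\min_i s_i))/\|s\|$.
Also, let $\Omega \subset \mathcal{H}$ be a random set of elements
with size $|\Omega| \ge O(2n\nu(1 + \beta)(\log n)^2)$ where
$\nu = \max((n\theta + 1)/4,\; n\rho^2)$. Then the solution of

$$ \begin{array}{ll} \text{minimize} & \|X\|_* \\ \text{subject to} & \operatorname{trace}(X^* W_i) = \operatorname{trace}((\imath Y)^* W_i), \quad W_i \in \Omega \end{array} $$

is equal to $\imath Y$ with probability at least $1 - n^{-\beta}$.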

The proof of this theorem follows directly by Theorem 4 if
$\imath Y$ has coherence $\nu$ with respect to the basis $\mathcal{H}$. We now
show this result.

Definition 6 (Coherence, Gross [2010]). Let A be
n x n, rank-r, and Hermitian. Let $UU^*$ be an orthogonal
projector onto range(A). Then A has coherence $\nu$ with respect
to an operator basis $\{W_i\}_{i=1}^{n^2}$ if both

$$ \max_i \operatorname{trace}(W_i UU^* W_i) \le 2\nu r/n, \quad\text{and} $$
$$ \max_i \operatorname{trace}(\operatorname{sign}(A) W_i)^2 \le \nu r/n^2. $$

For A^iY with s^e = 0: 



UU* = — ee"^ and sign(A) — 

s-* s 



1 

—I 

n 



Let Sp £ S, Kp G IC, and Dp £ T>. Note that because 
sign(A) is Hermitian with no real- valued entries, both quan- 
tities trace(sign(A)Z)i)'^ and trace(sign(A)S'i)'^ are 0. Also, 
because UU* is symmetric, tTSLce{KiUU* Kp) = 0. The 
remaining basis elements satisfy: 

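$$\operatorname{trace}(S_p UU^* S_p) = \frac{1}{n} + \frac{s_i^2 + s_j^2}{2 s^T s} \le (1/n) + \theta$$
$$\operatorname{trace}(D_p UU^* D_p) = \frac{1}{n} + \frac{s_i^2}{s^T s} \le (1/n) + \theta$$
$$\operatorname{trace}(\operatorname{sign}(A) K_p)^2 = \frac{2 (s_i - s_j)^2}{n \, s^T s} \le (2/n) \rho^2,$$

where p indexes the pair (i, j) for $S_p$ and $K_p$ and the
single index i for $D_p$.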

Thus, A has coherence $\nu$, with $\nu$ as given in Theorem 5,
with respect to $\mathcal{H}$, and we have our recovery result.
This theorem provides little practical benefit, however, unless
both $\theta$ and $\rho$ are $O(1/n)$, which occurs when s is
nearly uniform.
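
The coherence bounds can be checked numerically for a small
score vector. The sketch below is an illustration under stated
assumptions: the function name `coherence_quantities` and the
returned dictionary keys are invented for this example, and the
rank is fixed at r = 2. It builds the basis of S, K, and D
explicitly, so it is only practical for small n.

```python
import numpy as np


def coherence_quantities(s):
    """Evaluate both coherence conditions of Definition 6 for A = iY."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()                      # center so that s^T e = 0
    n = s.size
    e = np.ones(n)
    norm_s = np.linalg.norm(s)

    theta = np.max(s**2) / norm_s**2
    rho = (s.max() - s.min()) / norm_s
    nu = max((n * theta + 1) / 4, n * rho**2)

    A = 1j * (np.outer(s, e) - np.outer(e, s))
    P = np.outer(s, s) / norm_s**2 + np.outer(e, e) / n  # projector UU^*
    sign_A = A / (norm_s * np.sqrt(n))    # both nonzero singular values agree

    # Operator basis: diagonal (D), symmetric (S), and imaginary skew (K).
    basis = []
    for i in range(n):
        D = np.zeros((n, n), dtype=complex)
        D[i, i] = 1.0
        basis.append(D)
        for j in range(i + 1, n):
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = 1.0
            basis.append((E + E.T) / np.sqrt(2))
            basis.append(1j * (E - E.T) / np.sqrt(2))

    max_projection = max(np.trace(W @ P @ W).real for W in basis)
    max_sign = max(np.trace(sign_A @ W).real ** 2 for W in basis)
    return {
        "theta": theta, "rho": rho, "nu": nu,
        "max_projection": max_projection,  # compare with 2*nu*r/n = 4*nu/n
        "max_sign": max_sign,              # compare with nu*r/n^2 = 2*nu/n^2
    }
```

For a uniformly spaced s, the second condition is tight, which
shows how directly the spread of the scores controls the
sample bound.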

6. RESULTS 

We implemented and tested this procedure in two synthetic
scenarios, along with Netflix, MovieLens, and Jester
joke-set ratings data. In the interest of space, we only present
a subset of these results for Netflix.

6.1 Recovery 

The first experiment is an empirical study of the
recoverability of the score vector in the noiseless and noisy
case. In the noiseless case, Figure 2 (left), we generate a
score vector with uniformly distributed random scores between
0 and 1.




[Figure 2 plots omitted; both panels are plotted against the
number of samples.]

Figure 2: An experimental study of the recoverability
of a ranking vector. These show that we need
about $6 n \log n$ entries of Y to get good recovery in
both the noiseless (left) and noisy (right) case. See
§6.1 for more information.



[Figure 3 plots omitted; both panels are plotted against the
error level.]

Figure 3: The performance of our algorithm (left)
and the mean rating (right) to recover the ordering
given by item scores in an item-response theory
model with 100 items and 1000 users. The various
thick lines correspond to the average number of ratings
each user performed (see the in-place legend). See
§6.2 for more information.



These are used to construct a pairwise comparison matrix
$Y = s e^T - e s^T$. We then sample elements of this matrix
uniformly at random and compute the difference between 
the true score vector s and the output of steps 4 and 5 of 
Algorithm 2. If the relative 2-norm difference between these 
vectors is less than 10~^, we declare the trial recovered. For 
n = 100, the figure shows that, once the number of samples
is about $6 n \log n$, the correct s is recovered in nearly all
of the 50 trials.

Next, for the noisy case, we generate a uniformly spaced
score vector between 0 and 1. Then
$Y = s e^T - e s^T + \varepsilon E$, where E is a matrix of
random normals. Again, we sample elements of this matrix
randomly, and declare a trial successful if the order of the
recovered score vector is identical to the true order. In
Figure 2 (right), we indicate the fraction of successful
trials as a gray value between black (all failure) and white
(all successful). Again, the algorithm is successful for a
moderate noise level, i.e., the value of $\varepsilon$, when the
number of samples is larger than $6 n \log n$.
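
The setup of the noisy trial can be sketched as follows. This
is a rough illustration only. The recovery step here is
ordinary least squares on the sampled differences, used as a
simple stand-in; it is not steps 4 and 5 of Algorithm 2. The
function name `noisy_trial`, the sampling with replacement, and
the default parameters are assumptions made for the example.

```python
import numpy as np


def noisy_trial(n=100, eps=0.1, c=6.0, seed=0):
    """One trial: sample about c*n*log(n) noisy entries, then fit scores."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n)          # uniformly spaced true scores
    e = np.ones(n)
    E = rng.standard_normal((n, n))       # matrix of random normals
    Y = np.outer(s, e) - np.outer(e, s) + eps * E

    # Sample index pairs uniformly at random (assumed: with replacement).
    m = int(np.ceil(c * n * np.log(n)))
    rows = rng.integers(0, n, size=m)
    cols = rng.integers(0, n, size=m)
    keep = rows != cols                   # diagonal entries carry no information
    rows, cols = rows[keep], cols[keep]

    # Stand-in recovery: least squares for s_hat[i] - s_hat[j] ~ Y[i, j].
    B = np.zeros((rows.size, n))
    B[np.arange(rows.size), rows] = 1.0
    B[np.arange(rows.size), cols] = -1.0
    s_hat, *_ = np.linalg.lstsq(B, Y[rows, cols], rcond=None)

    # Success criterion from the text: identical ordering.
    success = np.array_equal(np.argsort(s_hat), np.argsort(s))
    return success, s_hat, s
```

Because the scores are only 1/(n-1) apart, the identical-order
criterion is strict: even small estimation errors can swap
neighboring items.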

6.2 Synthetic 

Inspired by Ho and Quinn [2008], we investigate recovering
item scores in an item-response scenario. Let $a_i$ be the
center of user i's rating scale, and $b_i$ be the rating
sensitivity of user i. Let $t_j$ be the intrinsic score of
item j. Then we generate ratings from users on items as:

$$R_{i,j} = L[a_i + b_i t_j + E_{i,j}]$$

where $L[\alpha]$ is the discrete levels function:

$$L[\alpha] = \begin{cases}
1 & \alpha < 1.5 \\
2 & 1.5 \le \alpha < 2.5 \\
3 & 2.5 \le \alpha < 3.5 \\
4 & 3.5 \le \alpha < 4.5 \\
5 & 4.5 \le \alpha,
\end{cases}$$

and $E_{i,j}$ is a noise term. In our experiment, we draw
$a_i \sim N(3, 1)$, $b_i \sim N(0.5, 0.5)$, $t_j \sim N(0.1, 1)$,
and $E_{i,j} \sim \varepsilon N(0, 1)$. Here, $N(\mu, \sigma)$
denotes a normal distribution with parameters $\mu$ and
$\sigma$, and $\varepsilon$ is a noise parameter. As input to
our algorithm, we sample ratings uniformly at random by
specifying a desired number of average ratings per user. We
then look at the Kendall $\tau$ correlation coefficient between
the true scores $t_j$ and the output of our algorithm using the
arithmetic mean pairwise aggregation. A $\tau$ value of 1
indicates a perfect ordering correlation between the two sets
of scores.
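
The data generation and the evaluation metric can be sketched
as below. Several choices are assumptions rather than details
from the text: the second parameter of N is treated as a
standard deviation, ratings are revealed independently with
probability (average ratings per user) / (number of items), and
Kendall's tau is computed in its simple tau-a form. All
function names are hypothetical. Only the mean-rating baseline
is scored here, not the matrix-completion algorithm.

```python
import numpy as np


def levels(alpha):
    """Discrete levels function L[alpha]: map reals to ratings 1..5."""
    return np.digitize(alpha, [1.5, 2.5, 3.5, 4.5]) + 1


def generate_ratings(n_users=1000, n_items=100, eps=0.5,
                     avg_ratings=5.0, seed=0):
    """Draw item-response ratings; unobserved entries are NaN."""
    rng = np.random.default_rng(seed)
    a = rng.normal(3.0, 1.0, size=n_users)   # center of each user's scale
    b = rng.normal(0.5, 0.5, size=n_users)   # rating sensitivity (std assumed)
    t = rng.normal(0.1, 1.0, size=n_items)   # intrinsic item scores
    E = eps * rng.standard_normal((n_users, n_items))
    R = levels(a[:, None] + b[:, None] * t[None, :] + E).astype(float)
    # Reveal each rating independently (assumed sampling scheme).
    observed = rng.random((n_users, n_items)) < avg_ratings / n_items
    R[~observed] = np.nan
    return R, t


def mean_rating(R):
    """Baseline score: average observed rating of each item."""
    return np.nanmean(R, axis=0)


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    x = np.asarray(x)
    y = np.asarray(y)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = x.size
    # Every unordered pair appears twice in the double sum.
    return (dx * dy).sum() / (n * (n - 1))
```

A call such as `kendall_tau(mean_rating(R), t)` then scores the
baseline against the true item scores. With very few ratings
per user, some items may have no ratings at all and the
baseline becomes undefined for them.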

Figure 3 shows the results for 1000 users and 100 items
with 1.1, 1.5, 2, 5, and 10 ratings per user on average. We
also vary the parameter $\varepsilon$ between 0 and 1. Each
thick line with markers plots the median value of $\tau$ in 50
trials. The thin adjacent lines show the 25th and 75th
percentiles of the 50 trials. At all error levels, our
algorithm outperforms the mean rating. Also, when there are
few ratings per user and moderate noise, our approach is
considerably more correlated with the true score. This
evidence supports the anecdotal results from Netflix in
Table 2.

6.3 Netflix 

See Table 2 for the top movies produced by our technique
in a few circumstances using all users. The arithmetic mean
results in that table use only elements of Y with at least 30
pairwise comparisons (it is an am all 30 model in the code
below). See Figure 4 for an analysis of the residuals
generated by the fit for different constructions of the matrix
Y. Each residual evaluation of Netflix is described by a code.
For example, sb all 0 is a strict-binary pairwise matrix Y
from all Netflix users and c = 0 in Algorithm 2 (i.e., accept
all pairwise comparisons). Alternatively, am 6 30 denotes
an arithmetic-mean pairwise matrix Y from Netflix users
with at least 6 ratings, where each entry in Y had 30 users
supporting it. The other abbreviations are gm: geometric
mean; bc: binary comparison; and lo: log-odds ratio.
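
To make the codes concrete, the sketch below builds an
arithmetic-mean pairwise matrix with both filters. It assumes
that the arithmetic-mean entry is the average of rating
differences over users who rated both items, and it stores
ratings as a dense users-by-items array with NaN for missing
values. The function and parameter names are hypothetical. A
dense array would not be practical at Netflix scale.

```python
import numpy as np


def arithmetic_mean_pairwise(R, min_user_ratings=0, min_support=1):
    """Build a skew-symmetric pairwise matrix from a ratings array.

    R: users x items, NaN where a rating is missing.
    min_user_ratings: drop users with fewer ratings (the "6" in am 6 30).
    min_support: keep an entry only if at least this many users rated
        both items (the "30" in am 6 30).
    Returns (Y, mask), where mask marks the entries that were kept.
    """
    R = np.asarray(R, dtype=float)
    observed = ~np.isnan(R)
    keep_users = observed.sum(axis=1) >= min_user_ratings
    R, observed = R[keep_users], observed[keep_users]

    R0 = np.where(observed, R, 0.0)
    M = observed.astype(float)

    counts = M.T @ M                  # users who rated both item i and item j
    sums = R0.T @ M - M.T @ R0        # sum of R[u, i] - R[u, j] over those users

    mask = (counts >= min_support) & (counts > 0)
    np.fill_diagonal(mask, False)

    Y = np.zeros_like(sums)
    Y[mask] = sums[mask] / counts[mask]
    return Y, mask
```

Under these assumptions, `arithmetic_mean_pairwise(R, 6, 30)`
would correspond to the am 6 30 construction, and the masked
entries of Y are the known entries passed to matrix completion.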

These residuals show that we get better rating fits by only
using frequently compared movies, but that there are only
minor changes in the fits when excluding users that rate
few movies. The difference between the score-based residuals
$\|\Omega(s e^T - e s^T) - b\|$ (red points) and the SVP
residuals $\|\Omega(U S V^T) - b\|$ (blue points) shows that
excluding comparisons leads to "overfitting" in the SVP
residual. This suggests that increasing the parameter c should
be done with care and good checks on the residual norms.

To check that a rank-2 approximation is reasonable, we 
increased the target rank in the SVP solver to 4 to investigate. 
For the arithmetic mean (6,30) model, the relative residual 
at rank-2 is 0.2838 and at rank-4 is 0.2514. Meanwhile, the 
nuclear norm increases from around 14000 to around 17000. 
These results show that the change in the fit is minimal and 
our rank-2 approximation and its scores should represent a 
reasonable ranking. 
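
The same kind of rank check can be illustrated on a small,
fully observed matrix. This sketch swaps in a truncated SVD of
the whole matrix for the SVP solver, so its numbers are not
comparable to the Netflix residuals above. The helper names
`demo_matrix` and `rank_k_fit` are invented for the example.

```python
import numpy as np


def demo_matrix(n=20, noise=0.05, seed=0):
    """Skew-symmetric test matrix: s e^T - e s^T plus skew noise."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n)
    e = np.ones(n)
    G = rng.standard_normal((n, n))
    return np.outer(s, e) - np.outer(e, s) + noise * (G - G.T)


def rank_k_fit(Y, k):
    """Relative residual and nuclear norm of the best rank-k approximation."""
    U, sigma, Vt = np.linalg.svd(Y)
    Yk = (U[:, :k] * sigma[:k]) @ Vt[:k]
    rel_residual = np.linalg.norm(Y - Yk) / np.linalg.norm(Y)
    nuclear_norm = sigma[:k].sum()
    return rel_residual, nuclear_norm
```

Comparing `rank_k_fit(Y, 2)` with `rank_k_fit(Y, 4)` shows the
same trade-off as in the text: a higher rank can only lower the
residual, while the nuclear norm of the approximation grows.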
Figure 4: The labels on each residual show the
method to generate the pairwise scores and how we 
truncated the Netflix data. Red points are the resid- 
uals from the scores, and blue points are the final 
residuals from the SVP algorithm. Please see the 
discussion in §6.3. 

7. CONCLUSION 

Existing principled techniques such as computing a Kemeny
optimal ranking or finding a minimum feedback arc
set are NP-hard. These approaches are inappropriate in
large scale rank aggregation settings. Our proposal is to (i)
measure pairwise scores Y and (ii) solve a matrix completion
problem to determine the quality of items. This idea is
both principled and functional with significant missing data.
The results of our rank aggregation on the Netflix problem
(Table 2) reveal popular and high quality movies. These are
interesting results and could easily have a home on a "best
movies in Netflix" web page. Computing a rank aggregation
with this technique is not NP-hard. It only requires solving
a convex optimization problem with a unique global minimum.
Although we did not record computation times, the
most time consuming piece of work is computing the pairwise
comparison matrix Y. In a practical setting, this could
easily be done with a MapReduce computation.

To compute these solutions, we adapted the SVP solver 
for matrix completion [Jain et al., 2010]. This process in- 
volved (i) studying the singular value decomposition of a 
skew-symmetric matrix (Lemmas 1 and 2) and (ii) showing 
that the SVP solver preserves a skew-symmetric approxima- 
tion through its computation (Theorem 3). Because the SVP 
solver computes with an explicitly chosen rank, these tech- 
niques work well for large scale rank aggregation problems. 

We believe the combination of pairwise aggregation and 
matrix completion is a fruitful direction for future research. 
We plan to explore optimizing the SVP algorithm to exploit 
the skew-symmetric constraint, extending our recovery re- 
sult to the noisy case, and investigating additional data. 